Move pod jobs to parallel execution #7382

mheon · 2020-08-19T20:21:18Z

Make Podman pod operations that do not involve starting containers (which must be done in a specific order) use the same parallel operation code we use to make `podman stop` fast on large numbers of containers. We were previously stopping the containers in a pod serially, which could take up to the stop timeout (default 15 seconds) per container: stopping 100 containers that do not respond to SIGTERM would take 100 × 15 s = 1500 s, or 25 minutes.

To parallelize pod operations, refactor the parallel operation code a bit to remove its dependency on libpod (damn circular import restrictions...) and use parallel functions that simply reuse the standard container API operations. This maximizes code reuse: previously, each pod handler had its own implementation of the container operation it performed.

This is a bit of a palate cleanser after fighting CI for two days; it's nice to be able to return to a land of sanity.

openshift-ci-robot · 2020-08-19T20:21:20Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: mheon

Needs approval from an approver in each of these files:

~~OWNERS~~ [mheon]

jwhonce · 2020-09-17T17:17:05Z

pkg/parallel/parallelctr/ctr.go

@@ -0,0 +1,42 @@
+package parallelctr


Why does the name stutter between the enclosing directory and the file?

Because I am not very creative and could not think of a better name.

mheon · 2020-10-07T14:00:29Z

This should (finally) go green now

mheon · 2020-10-07T16:26:52Z

It's green. @baude @rhatdan @vrothberg @TomSweeneyRedHat @jwhonce PTAL

TomSweeneyRedHat · 2020-10-07T17:50:10Z

pkg/parallel/parallel.go

+	}()
+
+	return retChan
+}


Really neat function. At first I was searching all over the code above for where the channel gets closed.

TomSweeneyRedHat · 2020-10-07T17:50:35Z

LGTM

baude · 2020-10-07T19:01:46Z

/lgtm

openshift-ci-robot requested review from giuseppe and jwhonce August 19, 2020 20:21

openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Aug 19, 2020

mheon force-pushed the pod_parallel branch 4 times, most recently from 93b045a to 51d10eb on August 20, 2020 15:10

mheon force-pushed the pod_parallel branch from 51d10eb to 2992947 on August 28, 2020 18:48

mheon force-pushed the pod_parallel branch 2 times, most recently from 51b555a to 2684f0f on September 17, 2020 17:08

jwhonce reviewed Sep 17, 2020

mheon force-pushed the pod_parallel branch from 2684f0f to cd30722 on September 17, 2020 17:59

mheon force-pushed the pod_parallel branch 4 times, most recently from 230a495 to e2c8595 on October 6, 2020 23:24

mheon and others added 2 commits October 7, 2020 10:00

Use WaitWithDefaultTimeout in cleanup

55f5e4a

Ensure that we actually print the output of all commands when cleaning up the results of the E2E tests.

Signed-off-by: Matthew Heon <[email protected]>

mheon force-pushed the pod_parallel branch from e2c8595 to 55f5e4a on October 7, 2020 14:00

TomSweeneyRedHat reviewed Oct 7, 2020

openshift-ci-robot assigned baude Oct 7, 2020

openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Oct 7, 2020

openshift-merge-robot merged commit 0e1d011 into containers:master Oct 7, 2020

edsantiago mentioned this pull request Oct 19, 2020

Cirrus CI: agent stopped responding #8068

Closed

github-actions bot added the locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. label Sep 24, 2023

github-actions bot locked as resolved and limited conversation to collaborators Sep 24, 2023
